Statistics for Political Science
August 5, 2025
R Patrick Buhr
Fifth-year Ph.D. Candidate …
Mason Auten
Third-year Ph.D. Student …
You are interested in asking and answering questions about politics.
Quantitative methods can help answer many of these questions.
Foundation of quantitative analysis:
- Based on numerical measurements (i.e., what is the quantity?)
- Used to develop and test theories that are generalizable.
- Creates measurements and analyses that are replicable.
Quantitative analysis is one of three dominant paradigms in political science research; the other two are qualitative analysis and formal modeling.
No single method is inherently superior, but quantitative methods are currently dominant in political science and offer the best opportunities for academic and private sector employment.
Description: Summarizing and understanding the characteristics of the data we have.
Inference: Making generalizations about a population based on sample data.
Prediction: Forecasting future events or behaviors based on existing data patterns.
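To make the first two goals concrete, here is a minimal Python sketch using a hypothetical poll (the 220-of-400 split is invented for illustration). Description summarizes the sample itself. Inference generalizes to the population, here with an approximate 95% interval that assumes a simple random sample and the normal approximation for a proportion. Prediction is not shown.

```python
import math

# Hypothetical poll of 400 respondents: 1 = approves, 0 = does not
sample = [1] * 220 + [0] * 180
n = len(sample)

# Description: summarize the data we actually have
share_approve = sum(sample) / n
print(f"Sample approval: {share_approve:.2f}")  # Sample approval: 0.55

# Inference: generalize to the population with an approximate 95% interval,
# assuming a simple random sample and the normal approximation for a proportion
standard_error = math.sqrt(share_approve * (1 - share_approve) / n)
lower = share_approve - 1.96 * standard_error
upper = share_approve + 1.96 * standard_error
print(f"Approximate 95% CI: ({lower:.3f}, {upper:.3f})")
```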
For quantitative analysis, we need to operationalize a concept into a numerical representation.
Sometimes a concept lends itself well to quantification: income, voter turnout, the number of bills introduced by a Member of Congress, or the hours of cable television a person watches.
Other times, quantification is easy, but there is dispute over which measure is best: GDP, GDP per capita, or the Human Development Index.
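A small worked example shows why the choice of measure matters: two hypothetical countries (all figures invented) can swap ranks depending on whether you use total GDP or GDP per capita. This is a sketch only; the Human Development Index is not computed here.

```python
# Hypothetical countries: (GDP in billions of dollars, population in millions)
countries = {
    "Country A": (2_000, 200),
    "Country B": (500, 10),
}

def gdp_per_capita(gdp_billions, population_millions):
    """GDP per person, in dollars."""
    return gdp_billions * 1e9 / (population_millions * 1e6)

# Rank by each measure
richest_by_gdp = max(countries, key=lambda c: countries[c][0])
richest_per_capita = max(countries, key=lambda c: gdp_per_capita(*countries[c]))
print(richest_by_gdp, richest_per_capita)  # Country A Country B
```

Country A has four times the GDP, but Country B produces five times as much per person ($50,000 vs. $10,000).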
“Likert” scale, 7-point:
1. Very Liberal
2. Liberal
3. Somewhat Liberal
4. Moderate/Middle of the Road
5. Somewhat Conservative
6. Conservative
7. Very Conservative
Measures both direction (Liberal vs. Conservative) and intensity (Somewhat vs. Very).
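One way to see that a single 7-point code carries both pieces of information is to "fold" the scale at its midpoint. The Python sketch below assumes the 1 to 7 coding listed above; the function names are hypothetical, and folding at the midpoint is one common convention rather than the only one.

```python
# The 7-point ideology scale above, mapped from text labels to numeric codes
IDEOLOGY_CODES = {
    "Very Liberal": 1,
    "Liberal": 2,
    "Somewhat Liberal": 3,
    "Moderate/Middle of the Road": 4,
    "Somewhat Conservative": 5,
    "Conservative": 6,
    "Very Conservative": 7,
}

MIDPOINT = 4  # "Moderate/Middle of the Road"

def direction(code: int) -> str:
    """Which side of the midpoint a response falls on."""
    if code < MIDPOINT:
        return "liberal"
    if code > MIDPOINT:
        return "conservative"
    return "moderate"

def intensity(code: int) -> int:
    """Distance from the midpoint: 0 (moderate) up to 3 (very)."""
    return abs(code - MIDPOINT)

for label in ["Somewhat Liberal", "Very Conservative", "Moderate/Middle of the Road"]:
    code = IDEOLOGY_CODES[label]
    print(label, code, direction(code), intensity(code))
```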
Can we use the same Likert scale?
On a scale of 1 to 5, with 1 being “strongly disagree” and 5 being “strongly agree,” how much do you agree with the following statements?
Scores are averaged to create a composite racial resentment index.
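The averaging step can be sketched in a few lines of Python. The respondent's answers are hypothetical, and the `reverse_items` option is an assumption added for illustration: if some statements are worded so that agreement means *less* of the concept, those items are typically recoded (on a 1 to 5 scale, as 6 minus the score) before averaging.

```python
def composite_index(item_scores, reverse_items=()):
    """Average 1-5 agreement scores into one index.

    reverse_items: positions of items worded in the opposite direction,
    recoded as 6 - score so that a higher value always means the same thing.
    """
    recoded = []
    for position, score in enumerate(item_scores):
        if not 1 <= score <= 5:
            raise ValueError(f"score {score} is outside the 1-5 scale")
        recoded.append(6 - score if position in reverse_items else score)
    return sum(recoded) / len(recoded)

# Hypothetical respondent answering four statements
respondent = [4, 5, 2, 3]
print(composite_index(respondent))                        # 3.5
print(composite_index(respondent, reverse_items={2, 3}))  # 4.0
```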
Every quantitative paper in political science answers some variation of the following question:
How (or why) does \(x\) affect \(y\)?
Every causal statement implies a counterfactual: “if \(x\) had been different, then \(y\) would have been different too.”
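One standard way to write this counterfactual down is potential-outcomes notation, sketched here for a binary \(x\); the symbols \(y_i(1)\), \(y_i(0)\), and \(\tau_i\) are introduced for illustration.

\[
\tau_i = y_i(1) - y_i(0)
\]

Here \(y_i(1)\) is the outcome unit \(i\) would have if \(x = 1\), \(y_i(0)\) is the outcome it would have if \(x = 0\), and \(\tau_i\) is the effect of \(x\) on \(y\) for that unit. Only one of the two outcomes is ever observed for a given unit, so causal effects must be estimated rather than directly observed.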
We study units.
Units have attributes.
Variables are logical groupings of mutually exclusive attributes.
Dataframes are a structured way to organize variables.
In this course, we will only use dataframes. The major alternative to dataframes is lists, which are used primarily in engineering and computer science.
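Conceptually, a dataframe is a table in which each row is a unit and each column is a variable recording one attribute per unit. The Python sketch below mimics that with a plain dictionary of equal-length columns so that no library is assumed; the members and their values are hypothetical.

```python
# Hypothetical toy data: each key is a variable (column), each position a unit (row)
members = {
    "name": ["Member A", "Member B", "Member C"],
    "party": ["Democrat", "Republican", "Independent"],
    "bills_introduced": [12, 4, 7],
    "in_majority": [True, False, False],
}

# Every variable must record exactly one attribute per unit
n_units = len(members["name"])
assert all(len(column) == n_units for column in members.values())

def unit(data, i):
    """Return all attributes of the i-th unit (one row of the table)."""
    return {variable: values[i] for variable, values in data.items()}

print(unit(members, 1))
# {'name': 'Member B', 'party': 'Republican', 'bills_introduced': 4, 'in_majority': False}
print(members["bills_introduced"])  # one variable across all units: [12, 4, 7]
```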
Definition: Nominal (sometimes called categorical) variables classify data into distinct categories without any inherent order or ranking.
Examples:
Definition: Ordinal variables have categories that can be ranked in a meaningful order, but the differences between categories are not necessarily equal.
Examples:
Definition: Continuous (sometimes called numeric) variables take ordered numerical values with equal intervals between them.
Examples:
Definition: Binary (sometimes called Boolean) variables take only two values: 1 or 0 (or TRUE/FALSE).
Examples:
What type of variable is each of the following (categorical, ordinal, continuous, binary)?
Religious affiliation (e.g., Catholic, Protestant, Muslim, Jewish, None).
Categorical, because the categories are not ranked.
Unemployment rate in a Member of Congress’s district (percentage of the district’s labor force that is unemployed).
Continuous, because the distance between values is consistent.
Political interest (e.g., not interested, slightly interested, fairly interested, very interested).
Ordinal, because the categories have a ranking.
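The variable type determines which summaries make sense. The Python sketch below uses hypothetical responses for the variables in the exercise (plus a hypothetical turnout indicator for the binary case) to show one appropriate operation for each type.

```python
from collections import Counter
from statistics import median_low

# Nominal (categorical): no order, so counting is meaningful but ranking is not
religion = ["Catholic", "None", "Protestant", "None", "Jewish"]
print(Counter(religion).most_common(1))  # [('None', 2)]

# Ordinal: categories have a ranking, so store that order explicitly
INTEREST_LEVELS = ["not interested", "slightly interested",
                   "fairly interested", "very interested"]
interest = ["fairly interested", "not interested", "very interested"]
ranks = [INTEREST_LEVELS.index(level) for level in interest]
print(INTEREST_LEVELS[median_low(ranks)])  # fairly interested

# Continuous: equal intervals, so differences and averages are meaningful
unemployment_rate = [3.8, 5.1, 4.4]  # percent, one value per district
print(round(max(unemployment_rate) - min(unemployment_rate), 1))  # 1.3

# Binary: coded 1/0, so the mean is the share of units coded 1
voted = [1, 0, 1, 1]
print(sum(voted) / len(voted))  # 0.75
```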
Often, we can represent concepts in different ways.
What would be the best way to operationalize education?
The best choice depends on your theory and research design.
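As an illustration, education can be recorded as years of schooling (continuous), highest degree completed (ordinal), or whether someone holds a college degree (binary). The Python sketch below assumes a hypothetical five-category degree list and hypothetical respondents; the category boundaries are illustrative, not a standard coding.

```python
# Hypothetical ordinal coding of highest degree completed
DEGREE_LEVELS = ["less than high school", "high school", "some college",
                 "bachelor's", "graduate"]

def degree_rank(degree: str) -> int:
    """Ordinal version: position in the ordered list (0 = lowest)."""
    return DEGREE_LEVELS.index(degree)

def has_college_degree(degree: str) -> bool:
    """Binary version: bachelor's degree or higher."""
    return degree_rank(degree) >= DEGREE_LEVELS.index("bachelor's")

# Hypothetical respondents; years_of_schooling is the continuous version
respondents = [
    {"highest_degree": "some college", "years_of_schooling": 14},
    {"highest_degree": "graduate", "years_of_schooling": 18},
]
for r in respondents:
    print(r["years_of_schooling"], degree_rank(r["highest_degree"]),
          has_college_degree(r["highest_degree"]))
```

Collapsing to the binary version discards the distinction between "high school" and "some college"; whether that loss matters depends on the theory being tested.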
R is a powerful and widely used programming language for statistical analysis.
It is free, open-source, and supported by a large community of researchers and data analysts.
R provides extensive libraries for data visualization, regression modeling, and machine learning.
Learning R enhances your ability to conduct independent research and analyze real-world data.
Principle One: “Write code for humans.”
Principle Two: “Let the computer do the work.”
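Here is one possible reading of the two principles, sketched in Python with hypothetical district figures: a descriptively named function documents what is being computed (code for humans), and a single loop applies it to every district instead of repeating the arithmetic by hand (letting the computer do the work).

```python
# Principle One: descriptive names say what each quantity is
def turnout_rate(votes_cast: int, eligible_voters: int) -> float:
    """Share of eligible voters who cast a ballot."""
    return votes_cast / eligible_voters

# Principle Two: loop over the data instead of repeating the calculation
# (hypothetical district figures: votes cast, eligible voters)
districts = {
    "District 1": (45_000, 90_000),
    "District 2": (30_000, 100_000),
}
turnout_by_district = {name: turnout_rate(votes, eligible)
                       for name, (votes, eligible) in districts.items()}
print(turnout_by_district)  # {'District 1': 0.5, 'District 2': 0.3}
```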
Types of problems:
Solving problems:
ChatGPT and other LLMs are tools. Being able to use them well will be an essential part of your workflow in the future.
However, like any tool, it is exceptionally easy to misuse ChatGPT if you don’t know what you are doing.
Use ChatGPT for learning, to clarify concepts, and to troubleshoot your own code—but do not use it to replace your own analytical thinking.
Your ability to think clearly and rigorously about the underlying statistics is what will set you apart.
Class Policy: you may use ChatGPT to help troubleshoot, debug, or clarify your code. If you do so, you must submit the text of your chat as part of your assignment.
How do you get to Carnegie Hall? Practice, practice, practice.
Graduate education is different from undergrad:
Treat this like a job: start work at 9:00AM and work until at least 5:00PM.